DOCUMENT RESUME



ED 126 0*6 



95 



SP 010 209 



AUTHOR          Follettie, Joseph F.
TITLE           Within and Beyond the Formative and the Summative: An
                Evaluation Perspective for Large-Scale Educational
                R&D.
INSTITUTION     Southwest Regional Laboratory for Educational
                Research and Development, Los Alamitos, Calif.
SPONS AGENCY    National Inst. of Education (DHEW), Washington,
                D.C.
REPORT NO       SWRL-PP-23
PUB DATE        16 Feb 73
NOTE            47p.
EDRS PRICE      MF-$0.83 HC-$2.06 Plus Postage.
DESCRIPTORS     Decision Making; *Educational Development;
                *Educational Research; Educational Resources;
                Evaluation Criteria; *Evaluation Methods; *Formative
                Evaluation; Productivity; School Services; *Summative
                Evaluation
IDENTIFIERS     *Social Indicators

ABSTRACT
This paper schematizes large-scale educational research and
development (R&D) as a progression of operations and presents a
perspective for evaluating those operations and their outputs. Most
perspectives thus far presented for evaluation of educational R&D are
oriented to small-scale operations and modest products. Prevailing
views of formative and summative evaluation, as developed by Scriven,
are analyzed in terms of the state of the art for use of social
indicators in isolating first-order and higher-order program effects.
Implications of the perspective for educational policy, R&D, and the
full-service school are presented. Major dimensions of an evaluation
perspective are examined along with organizational and individual
roles in improving productivity. Some of the chapters characterize
the complex educational product and cause-effect progressions
pertinent to complex evaluations. It is concluded that independent
evaluation seems required for all evaluations conducted for a
sponsor. The best interest of a development organization will be
served by independent evaluators working under contract. There will
not be any other kind of evaluation of higher-order effects until a
system of social indicators is developed, evaluated, and
appropriately institutionalized. (SK)










SWRL EDUCATIONAL RESEARCH AND DEVELOPMENT 







Within and Beyond the Formative and the
Summative: An Evaluation Perspective
for Large-Scale Educational R&D

10 February 1973 Professional Paper 23



SWRL EDUCATIONAL RESEARCH AND DEVELOPMENT, 4665 Lampson Avenue, Los Alamitos, Calif. 90720.
Published by SWRL Educational Research and Development, a public agency supported as a regional educational laboratory by
funds from the National Institute of Education (NIE), Department of Health, Education, and Welfare. The opinions expressed in
this publication do not necessarily reflect the position of NIE, and no official endorsement by NIE should be inferred.



SWRL EDUCATIONAL RESEARCH AND DEVELOPMENT

Professional Paper 23

February 1973



WITHIN AND BEYOND THE FORMATIVE AND THE SUMMATIVE: AN EVALUATION
PERSPECTIVE FOR LARGE-SCALE EDUCATIONAL R&D

Joseph F. Follettie



ABSTRACT 



Prevailing views of formative and summative evaluation are analyzed
in terms of the state-of-the-art for use of social indicators in
isolating first-order and higher-order program effects. Implications
of the perspective for education policy, R&D, and the full-service
school are presented.



CONTENTS 



I    INTRODUCTION
       Implications of the Scale of R&D
       The Scale for Typological Complexity of R&D Evaluations
II   MAJOR DIMENSIONS OF AN EVALUATION PERSPECTIVE
       Level of Product Complexity
       Level of Product Maturation
       Category of Decision-Maker
       Level of Interest in Cause-Effect
       Category of the Comparison Standard
III  SOME BROADER ISSUES
       Perspectives on Higher-Order Effects
       Organizational and Individual Roles in Improving Productivity
IV   CAUSE-EFFECT STRUCTURE
       School Services
       Differentiation of Effects
       Differentiation of Antecedents
       Exemplars of Lower-Order Effects Evaluations
V    ENDS AND MEANS
       A Decision Perspective for Service Productivity
       Development-Evaluation Contracting Procedure
VI   CONCLUDING REMARKS
REFERENCES






I 

INTRODUCTION 



IMPLICATIONS OF THE SCALE OF R&D 

This paper schematizes large-scale educational R&D as a progression
of operations and presents a perspective for evaluating these
operations and their outputs. It is contended that effectiveness of
educational R&D should increase with its scale. However, the intent is
not to recommend a pure-form "big R&D" as an alternative to a
pure-form "little R&D." The requirement appears rather to substitute a
"mixed" economy for the laissez-faire "little R&D" economy. While this
already is happening, its implications for evaluation of educational
R&D as yet are underperceived.

Whether by design or oversight, most perspectives thus far presented
for evaluation of educational R&D are oriented to small-scale
operations and modest products. Perhaps the most extreme manifestation
of this tendency yet to appear is the work of Bloom et al. (1971),
wherein the classroom teacher becomes a one-person R&D organization
who develops and evaluates limited educational routines in the
classroom situation, with students simultaneously cast in the roles of
learner and guinea pig. Approaches to evaluation that assume a more
complex product that is amenable to study within the framework of a
multivariate research design also appear in the literature -- cf.
Scriven (1967), Siegel & Siegel (1967), Stephens (1967), Light & Smith
(1970). However, even these more concessionary contributions to
large-scale operations addressing complex educational products are
largely silent on the important product genesis operations and tend to
view the R&D process simply as "develop and evaluate," whether on a
one-time or repeated basis.

However drastic, educationally-referenced efforts that seek to
reengineer only one or a few situational characteristics seldom will
appreciably ameliorate a prevailing education that suffers in
relevance and is underproductive. By definition, educational R&D does
not have access to every antecedent underlying educational effects. It
cannot secure improved prepartum and postpartum care. Nor can it
otherwise in the shorter term appreciably influence the preschooler
before he enters school. Yet an appreciable subset of pertinent
antecedents to educational effects are accessible to educational R&D.
Every characteristic of educational structure and function potentially
is accessible to redesign. Educational R&D probably cannot reach its
potential level of effectiveness unless predisposed to address a wide
range of accessible determinants of educational characteristics. This
R&D should prove complex and so costly. To insure that the investment
is relevant and that it yields a productive return, such R&D will
require progressive evaluation over extended time. These
considerations suggest appropriateness of what Price (1963) has called
"big science" or, strictly speaking, of an analogy to it -- "big
educational R&D." Yet the evidence is scant that we have thus far
broken the mold of "little educational R&D."

"Little educational R&D" features a population of isolated academic
entrepreneurs -- individuals or small groups -- who employ limited
perspectives that encompass one or a few of the most accessible
situational characteristics. Such activities tend to be independent
and uncoordinated. They tend also repetitively to address only that
work that the small-scale operator finds easiest and cheapest to do
and to sell. The result is a collection of fragments that do not sum
to an effective effort. The special contributions that "big
educational R&D" can make are those of promulgating more synoptic
views and of marshalling the organized engineering efforts that are
consonant with these larger views.

The solution to complex social problems does not lie in the
direction of pretending that the problems are simple or that piecemeal
attacks will suffice. Whether the problem is ecological,
socioeconomic, or educational, the reengineering portion of effective
responses necessarily requires large-scale R&D. The complex
educational products that result from such efforts address an
appreciable portion of the pertinent characteristics of an educational
system. Such multidimensional designs for educational structure and
function that are developed over extended time are illustrated
elsewhere (Follettie, 1972). The present paper describes an evaluation
perspective for large-scale efforts to create complex educational
products.

THE SCALE FOR TYPOLOGICAL COMPLEXITY OF R&D EVALUATIONS 

Evaluation theorists have long distinguished between two gross
categories of educational R&D evaluative effort. The first,
"preliminary evaluation," is conducted when product structure is fluid
and so modifiable. Its purpose typically is described as "product
improvement." The second, "terminal evaluation," is conducted after
the product reaches "final form." Its purpose typically is described
in terms of reaching decisions on "product worth." Scriven (1967,
p. 43) is generally credited with supplying the terms formative and
summative that are now widely used to refer to these categories of
evaluation.

Like many dichotomies that are useful in the abstract, the
formative-summative distinction has proved ambiguous at more concrete
levels. The terms are used underdiscriminately in the evaluation
literature and, too often, with only honorific or pejorative meaning.
However, it is possible to ground the formative-summative distinction
on a conceptual network that better reflects a characteristic and
consistent usage. This paper first sketches this multidimensional
network. Then it sets forth categories of evaluation consonant with
the conceptual network and technical state-of-the-art for educational
R&D. The degree to which the primary categories that emerge "really"
reflect the formative and summative labels that are retained
throughout the paper may be debated by sophists and Platonists, but is
outside the concern of the paper.

As currently used, the formative-summative distinction
oversimplifies the progression of decisions that evaluation of
educational R&D efforts must serve when the product is complex and
costly. The dichotomy was formulated to apply to simple products such
as a textbook or other limited educational routine that is to be
substituted for one facet of a school whose structure and function are
for the most part untouched by the product. Such simple products
epitomize the ambitions of "little educational R&D."

R&D programs that are organized to address the full range of issues
requiring resolution in education implicate products that are more
complex and costly than such simple products. Moreover, the more
complex products mature over extended time and so have a greater
potential for remaining invisible to the public or to the R&D sponsor
during longer periods used to formulate, develop, and operationalize
them than do the simple products. For these and other reasons efforts
that yield complex educational products should require more frequent
evaluations to serve more decisions than do the plug-in products that
ground prevailing evaluation perspectives.

Simple educational products give rise to the view of educational
R&D as a linear two-stage process whose first stage, product
development, invites classic formative evaluation and whose second
stage, product evaluation, invites classic summative evaluation. This
view casts all questions relating to genesis of product specifications
in limbo and so invites an "evaluator" first to impute genesis
decisions to the development organization and second to frame actions
that ferret out idiosyncratic biases of the development organization
as these show up in the product. In a series of unpublished but widely
circulated notes prepared for NIE in 1971-72 to which we will refer
extensively, Scriven charts such a course. Only as Scriven's
perceptions reference to "little educational R&D" that is grounded on
the maxim "Every man for himself" can he be said to be on target.

There is general agreement that all sorts of cut-and-paste
operations that serve product improvement objectives may be required
during product development. Decisions to modify the developing product
to serve such objectives typically are reached on evidence afforded by
formative evaluation in its classic sense. These evaluations cannot
occur until the product or some portion of it reaches a form wherein
it can be applied to a student so that first-order effects on the
student can then be evaluated.

During an earlier product formulation phase of the product
development effort (or even antedating it in the sense that the
sponsor is able to specify product characteristics), one must specify
the domain of product first-order effects (e.g., beginning reading)
and the proficiency or behavior dimensions along which first-order
effects will be evaluated (e.g., decoding English monosyllables of
specified novelty from print to speech, decoding polysyllabic words of
specified novelty in light of applicability of morphophonemic rules,
supplying appropriate intonation patterns to sentences during
decoding). Someone also must specify the educational cost constraints
that will apply. If the R&D investment is to be protected, it is
necessary also that the development organization specify and the
sponsor have evaluated those student transit rates along specified
proficiency or behavior dimensions that applicable states-of-the-art
warrant (Follettie, 1972). Concerns over monolithic or self-serving
"big educational R&D" generally reference to such product formulation
activities and particularly to specification of domains and
dimensions.[1]

[1] "Big educational R&D" is not incompatible with the proposition
that both educational and educational R&D enterprises need to be made
more democratic. These enterprises will underserve society to the
extent that they are arbitrary and oriented to a self-serving status
quo. It appears untenable that we can increase the democracy of these
enterprises by giving carte blanche either to enterprise personnel or
to parents and students. Somewhere between the extremes of an
autocratic establishment that reserves all judgments to itself and an
anarchistic one that thrusts all of these judgments on parents and
students should lie a social contract that is tenable for educational
R&D. The view of a mixed educational R&D economy that seeks seriously
to ameliorate profound educational problems must attack a variety of
status quo practices. No one group of individuals -- professors, R&D
personnel, government officials, school personnel, or parents and
students -- can hope to right the problems of prevailing education
while working in isolation. All such groups probably could contribute
to a greater extent than they have. We advocate large-scale R&D
operations because one cannot hope to bring the different
jurisdictions and interests into common effective cause unless they
interact within a shared framework that disciplines and focuses the
different points of view.

Each product formulation activity is capable of independent
evaluation to confirm for the sponsor that the progressing product
development operation has social promise and that the sponsor is
receiving value on the investment. Each such evaluation either might
cause the sponsor to accept progress to that point or to require
modification of the formulation as the condition of continued funding.
There is little place for such a scheme in classical views on
evaluation during product development. Following the classical views,
the sponsor cannot hope to shut the barn door until after the horse
has escaped. The classic view encourages elitist social planning --
whether by funder or developer -- that critics of "big educational
R&D" -- e.g., Atkin & Grotelueschen (1971) -- rightfully decry.

When the investment is small, the sponsor will have little
incentive to fund the progression of investment insurance evaluations
alluded to above. However, when the investment is large, then economic
considerations alone should compel that the sponsor sign off or
signify displeasure at each of a progression of critical points during
product formulation and development. The member evaluations of such a
progression each will have summative implications for the sponsor and
formative implications for the formulation-development organization.
However, these summative implications will not be those of classical
summative evaluation, which reference to a product in "final form."
Rather, these will be implications of a redefined summative category
that references to a progression of "summative" entities, where only
the last few members of the progression are product entities in the
classical summative sense. One tries to excise malignancies early,
because the odds are not good that the patient can be saved if these
growths are allowed to reach terminal multiplication.

It is also an oversimplification, when the product is costly and
complex, to view educational R&D as culminating in a product
evaluation stage that permits only classical summative evaluation. One
can afford to restrict one's options to accepting or rejecting a
cheaply-developed item of any sort. However, we have only to look at
the firms that develop, manufacture, and sell the complex systems that
power and guide contemporary industry and facilitate modern commerce
to see that this range of choices is too narrow when the product is
complex and costly. Computer systems typically malperform in minor
ways when initially installed in an operating setting. Computer firms
would be out of business if they did not have the option of making the
initially-malperforming system right following installation. The
economics of large investments in educational R&D should compel that
product evaluation have summative implications for the sponsor and
formative implications for the development organization. However,
these formative implications will not be those of classical formative
evaluation that references a product not yet ready for evaluation in
the operating setting. This is a matter of augmenting Light & Smith's
(1970) emphasis on selecting the best products or components with an
emphasis on tinkering with good products to make them best buys in the
operating setting.

Evaluation of educational R&D for the most part was exclusively
summative, in the classical sense, prior to a decade or two ago.
Particularly among educational research faculties, the scales swung
toward classical formative evaluation a decade ago, perhaps as a
fallout of the programmed learning movement. Scriven (1967) apparently
was the first to see the need for both forms of evaluation. The two
warrant equal billing in his earlier views. Hence we must distinguish
between the earlier and current Scrivens.

The primary interest of the current Scriven -- reflected in his
1971-72 notes -- centers on summative evaluation. However, the current
Scriven is less interested in the summative evaluations of yore, which
addressed first-order product effects, than in summative evaluations
dealing with higher-order effects.

The present paper accepts or seeks to extend certain of the current
Scriven's views, notably the view that summative evaluations -- and
conceivably all evaluations -- conducted by a development staff risk
the biasing of evidence based on conflict of interest. However, the
paper argues that Scriven's emphasis on higher-order effects of an
educational product is unbalanced and that it is operationally
premature in light of the knowledge and technology currently available
to support such evaluation. Few would deny that higher-order effects
of education are of legitimate concern to society. However, attempts
to identify and demonstrate such effects will prove largely
ineffective until a system exists that defines baselines against which
social cause and effect can be gauged. A later section of the paper
considers this matter in light of the views of Bauer (1966) and his
associates on the need for social indicators.




II

MAJOR DIMENSIONS OF AN EVALUATION PERSPECTIVE

New perspectives would not be needed if we sought only to enshrine
defective prevailing practices. Hence we need not be too concerned at
this point with the well-documented fact that no agency or combination
of such has yet emerged to play the central role that effective
educational R&D requires. Any candidate agency might use the
perspective to be presented as a standard against which it can decide
whether it will fish or cut bait. Any other agency might use the
perspective as a standard against which to evaluate the rhetorical
productions of candidate agencies predisposed to pretend to fish.

Five major dimensions of an evaluation perspective for educational
R&D will be discussed. Taken together, these dimensions are meant to
be rather exhaustive. However, they show a tendency, as presented, not
to be mutually exclusive; some covary to a degree. Where covariation
is appreciable, as between the level of product complexity and the
scale of R&D, one dimension is counted although both are discussed.

LEVEL OF PRODUCT COMPLEXITY

The product may be simple or complex. Product complexity should
appreciably implicate size and complexity of the effort to develop or
evaluate the product. Typically, the simple product will entail
"little R&D" and the more complex product "big R&D."

During the era of educational R&D that has heretofore prevailed, it
has been customary for one or a few individuals -- typically with
publisher, governmental, or school district backing -- to develop a
plug-in or chassis-replacement educational product in consequence of
individual perceptions concerning what might sell in the educational
market. Such a product may have objectives that are the same as or
different from those of a product that, currently in use in the
schools, would need be removed to make way for the new product. A
product whose objectives are similar to those of a product that it
seeks to supplant in the schools is sold on contentions -- warranted
or not -- that the new product for some reason is more attractive than
the old. A product whose objectives are novel is sold on contentions
that it is more socially relevant than competing products. In either
case, the new product typically is viewed as a chassis replacement for
an existing product. Its installation typically should minimally
disrupt existing structure-function of the schools -- or status quo --
which, for the most part, the new product will leave intact.

When we view the product so, then the position of Atkin &
Grotelueschen (1971) follows that the teacher -- like it or not -- is
the final decision-maker concerning what goes on in the classroom. The
position follows because a product that leaves the status quo of
education appreciably intact cannot hope to do such things as open the
door that, closed, transforms the classroom into an inner sanctum.
Simple educational products will continue to be developed and
marketed. Some of them will undoubtedly contribute to more pertinent
and productive education. However, any argument that products must be
simple consonant with serving the status quo because -- like it or
not -- the status quo must be served cannot be attuned to the same
level of educational disaster that this paper perceives.

The plug-in product orientation is analogous to that for the home
equipment enterprise. A host of independent entrepreneurial forces
mold the home by molding public opinion concerning what constitutes
progress -- a new gadget (TV), a new wrinkle (color TV), or a new
invitation to optimize idleness (an electric can-opener). The process
of change is uncoordinated, incremental (if positive), and
return-sensitive (whether return is defined on prestige or something
more tangible). So it is with simple educational products. These
typically do not entail complex, coordinated efforts, are
incrementally -- rather than comprehensively -- oriented to problems
of the schools, and are return-sensitive (at best) rather than
cost-return-sensitive.

One must acknowledge the views that social programs presently can
be designed only as incremental responses to immediate crisis (cf.
Braybrooke & Lindblom, 1963) and that educational cost-return concerns
are premature. Still, it is presently easy to achieve an increase in
comprehensiveness of orientation to educational improvement, if only
because level of ambition heretofore has been so low. Quantification
of educational cost and return alike pose problems. Nevertheless,
however cautionary the views of contemporary measurement theorists,
school bond elections and the budgeting practices of government alike
suggest that the era of educational pigs-in-pokes has ended. Whether
the product is simple or complex, it is increasingly likely that
product underwriters will want to know what the product will do and at
what operating costs. Appeals to prematurity increasingly will fall on
deaf ears.

The labels we commonly use to characterize an educational product --
treatment, program, product -- tend to trivialize the product for
complexity. If we take the full-service school as the locus of
lower-order product effects, then one may view one or more of the
school's services as a complex educational product. A multiyear
service then becomes of interest in its entirety as a structure that
transits a student from a first-year entry to a last-year exit.
Alternatively, the complex product can be viewed as a cross-service
entity having functions that are in support of several services of the
full-service school. Such a school is not a single model school or
experimental school. It is any school that offers a full line of
instructional, enrichment, and child care-socialization services.

When we elevate product complexity to that of a service or
cross-service component of the full-service school or, ultimately, to
the level of the full-service school itself, the prevailing evaluation
view must give way to one that is more sensitive to product
formulation steps and to the role of evaluation in investment
protection.

It is likely that the other dimensions of an evaluation perspective 
to be discussed in this section will be differently valued, depending 
on whether the product to be developed is simple or more complex. Below, 
these dimensions will be discussed primarily from a standpoint of
value-setting implications when the product takes complex form.

LEVEL OF PRODUCT MATURATION 

Conventional views on educational product fluidity oversimplify the
decision options that a sponsor will find useful when a complex
product is to be developed and evaluated. The conventional view is
that the sponsor initiates a product development effort that is
generally characterized -- e.g., as improvement of K-3 reading -- and
that the sponsor thereafter monitors development operations on an
intuitive basis while awaiting product delivery. Such a view neatly
partitions evaluation into a formative phase that antedates product
delivery and a summative phase that follows product delivery. If this
view of sponsorship practice is nearer to fact than to fiction, then
practice must be changed. For it makes the sponsor less responsive to
technical advice and to educational relevance issues and less
responsible to the sponsor's constituency and to development
organizations seeking definitive guidance than it should be when
costly products that address large educational problems are required.

Consider educational R&D from the standpoint of a sponsor that,
interacting over time with an educational product development
organization, formulates and develops a desired educational product.
The sponsor must first decide which of the organizations that may be
available should initiate product formulation. Many consequent
decisions of this type can be identified. These decisions all turn on
prior sponsor efforts that evaluate capability of organizations based
on past performance. We will not further dwell on such decisions here.

Once an organization that will initiate product formulation is
identified and oriented to the product domain -- presumably on the
basis of rather general specifications developed by the sponsor -- a
product development staff of the organization should proceed to
identify product specifications that are consonant with the general
guidance, definitive, and acceptable to the sponsor. The effort to
formulate definitive product specifications perhaps would reflect
formative evaluations of specifications at different points in the
effort (or evaluation that, conducted by the development staff, seeks
to make the specifications more relevant in macroscopic and
microscopic senses -- cf. Follettie, 1972). At some point, the effort
to formulate product specifications should be ready for independent
evaluation that assesses the effort's structure of product proficiency
dimensions for macroscopic and microscopic relevance.

Evaluation of product specifications is relevance evaluation.
Conducted long before a definitive product exists, such evaluation
cannot hope to extend definitively beyond product first-order effects
-- that is, the direct effects that most concern a development staff.
Evaluation of product specifications could be viewed as formative if
product-referenced, in the trivial sense that it seeks to improve the
product. However, when the evaluation is viewed as
specifications-referenced, it is summative in that it serves a
decision to accept or reject the set of product specifications.
Conceivably, the evaluation will reference to a standard for social
values rather than to a comparative framework that employs the
educational status quo as the standard.

It is noteworthy that the same evaluation viewed as summative from
the standpoint of a sponsor can be viewed as formative from the
standpoint of the development staff that might be required to modify
specifications in light of an independent evaluation. The notion that a
development staff conducts formative evaluations and an independent
evaluation team summative evaluations is an oversimplification. All
summative evaluations of complex and expensive educational products
have the formative overtone. If the thing evaluated is almost but not
quite right, then making it right usually will be economically
preferable to starting anew from scratch. Views that contradict this
position are persuasive only in the context of trifling investments in
educational R&D.



The structure of product proficiency dimensions accepted by the
sponsor as characterizing the domain of apt first-order effects, it
becomes necessary to place contractual standards on the development
staff concerning the extent to which a student will be transited over
these dimensions. These standards can be viewed either as criterion
proficiency standards, where the investment in operating costs of the
school and in student time is specified, or as cost-referenced transit
rate standards. The distinction is only terminological. Transit rate
terminology is used here.

Product specifications might reveal what proficiency levels should
be taken as entry values and indicate upper bounds for student and
school contributions to the costs of transiting students across the
set of product proficiency dimensions. The development staff should
read the applicable states-of-the-art in the context of specified
school operating costs and, if available, the experience of prevailing
education regarding a comparable existing product. In consequence, it
should reach guesstimates that are preliminary transit rate
specifications for the product. Since the specified student population
should prove heterogeneous both for entry skills and for transit rates,
transit rate specifications should reflect both central tendency and
dispersion statistics. At some point in such a development effort, the
transit specifications should be ready for independent evaluation that
judges
preliminary transit rate values against applicable states-of-the-art. 
This evaluation serves a sponsor decision to accept, reject, or require
modification of transit specifications. Hence, the evaluation is
summative when specifications-referenced. Needless to say, transit rate
specifications for a new product should exceed those that characterize
a corresponding prevailing educational product. However, simply
exceeding the status quo seems less desirable than exceeding it to the
extent that cost-constrained exploitation of applicable
states-of-the-art makes possible. Both the development staff effort to
produce preliminary transit rate specifications and the independent
evaluation of these specifications necessarily will be intuitively
based. The purpose of evaluation in this instance is to discourage the
development staff from making its work too easy by referencing its
effort to an appreciably underproductive status quo.
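
The requirement that transit rate specifications carry both central
tendency and dispersion statistics can be illustrated with a minimal
sketch. Everything named here is an assumption made for illustration and
does not come from the paper: the function names, the unit (proficiency
units gained per hour of instruction), and the sample rates are all
hypothetical.

```python
from statistics import mean, stdev


def transit_rate_spec(rates):
    """Summarize per-student transit rates as central tendency and dispersion.

    The unit is hypothetical: proficiency units gained per hour of instruction.
    """
    return {"mean": mean(rates), "sd": stdev(rates)}


def exceeds_status_quo(new_spec, prevailing_spec):
    """True when the new product's mean transit rate beats the prevailing one."""
    return new_spec["mean"] > prevailing_spec["mean"]


# Illustrative, made-up rates for five students under each product.
prevailing = transit_rate_spec([1.0, 1.5, 2.0, 2.5, 3.0])
proposed = transit_rate_spec([2.0, 2.5, 3.0, 3.5, 4.0])
print(proposed, exceeds_status_quo(proposed, prevailing))
```

A check of this kind captures only the weaker test, that the status quo
is exceeded. The stronger test argued for in the text, whether the
specification approaches what cost-constrained states-of-the-art permit,
remains a matter of judgment that such a comparison does not settle.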

Preliminary transit rate specifications accepted by the sponsor
as consonant with product operating cost provisions and power of
applicable states-of-the-art, advanced development should ensue.

During advanced product development, limited tryouts of facets of
the developing product will occur. Conducted by the development staff,
these tryouts provide the earliest empirical basis for deducing product
transit rate characteristics. They form a progression of formative
evaluations and modifications that culminate in development of a
product that a) is characterized by empirically-based provisional transit
rate specifications and b) is ready for full-scale tryout. The sponsor's
decision to have a full-scale tryout should stem from evidence, gained
during the limited tryouts, that the product is promising for
educational productivity. This promise is reflected in provisional
transit rate specifications.



Findings obtained during limited tryouts condition a decision to
have a full-scale tryout, which may be costly. If one views these
cut-and-paste-serving tryouts as an informal series that terminates on
definitive tryouts for isolated portions of the educational product, then
entertainably these terminal members of the series should be viewed as
summative evaluations conducted by independent evaluation teams. A
possible compromise between terminal limited tryouts conducted exclusively
by a development staff that risks conflict of interest and an evaluation
team that does not is for the development staff to conduct such
evaluations with a technical representative of the sponsor monitoring
these evaluations closely. The ultimate extension of this point of view
treats data-collection requirements generated by all scientists,
engineers, and other interested individuals as subject to conflicts of
interest that may, at minimum, distort perception and so bias findings.
There is something to be said for this, and the advocates of single-blind
and double-blind studies have said it. However, at some point in the
effort to eliminate conflict of interest, practical considerations
intrude, and one is forced either to accept some capacity for honest
appraisal or to create an unmanageable system whereby the police who
police the police are themselves policed, ad infinitum.



Independent evaluation during a full-scale tryout establishes
tenability of provisional transit rate specifications and serves an
agency decision to install the product on a probationary
basis--probationary installation. The full-scale tryout is the first of a
series of whole-product-referenced summative evaluations. However, its
findings might suggest limited product modifications that should occur
prior to probationary installation. Moreover, because the full-scale
tryout situation will not be isomorphic with the operating-school
situation that the product is designed to accommodate, findings might
suggest how transit rate specifications should be modified to "adjusted
provisional transit rate specifications," which will be used during the
earliest portion of the probationary installation period to evaluate
both product and performance of the schools with regard to the product
(see Follettie, 1972). Thus, when viewed from the standpoint of a
product development staff, summative evaluation again takes on formative
evaluation coloration.

Development staffs and educational evaluation theorists alike have
tended to view the total development effort reported up to this point
as one to which formative evaluation is applicable but not summative
evaluation. One can understand how this view could arise in the
climate of an unregulated free R&D market and its dictum that any notion
is a good one that sells (even for a few seasons). However, the
continuation of this orientation to evaluation when costly complex
educational products are to be developed promises to be much too expensive
and wasteful to perpetuate. A sponsor should not take an extended
costly ride over the route sketched above without assuring itself
along the way that social need is being served and that early promise
is maturing into something more tangible. Scriven is correct that
verbal evidence supplied by a product development organization will not
always be disinterested. However, his responses to the problem of
conflict of interest seem half measures at best. His outside or
"goal-free formative" evaluation mislabels a progression of evaluations
that always can be viewed as summative if properly referenced.

The alternative view presented above saddles the sponsor with
responsibilities that have not heretofore been acknowledged. The
view of a progression of summative evaluations throughout product
development requires any sponsor that acts as the central nervous
system for "big educational R&D" operations to lead, whereas candidates
to sponsorship roles heretofore have been content to advise and consent.

Additional and important other product development and evaluation
efforts lie beyond probationary installation--or within a probationary
operation period. Extended products necessitate that the probationary
operation period be extended. The customary view of the
formative-summative dichotomy entails viewing the period as one wherein
only summative evaluation occurs. It is reasonable that first-order
product effects should be definitively summatively evaluated during the
period. However, when the product is complex and extended, more should
occur during probationary operation than classical summative evaluation.

The probationary operation period of an R&D sequence for complex
educational products is a hitherto unrecognized necessity. Its analogue
is to be seen in all large-scale operations that yield complex
artifacts and systems. We explore the period first in the tidier world of
commerce.



The manufacturer of complex hardware systems--e.g., computer
systems--would not be in business long if not allowed, during a
post-installation period, to do whatever is required to bring the system up
to contract specifications. The hardware system manufacturer is
contractually bound to develop a system that works as well in the
operational setting as it does in the factory. The contract is not
fulfilled when the product is delivered to a receiving room or installed in
an operating room. Only when its first-order effects are demonstrated
in the operational setting using inputs that characterize the setting
is the contract fulfilled. At that time the manufacturer secures
buyer acceptance of the product.

The manufacturer first demonstrates in the factory setting that the
system performs according to contract specifications. This test is
analogous to the full-scale tryout of an educational product in that
first-order effects are evaluated in a setting that is similar to but
not identical to the operating setting. The decision to install the
system follows from favorable findings obtained in the factory setting.
During a probationary period beginning with installation and ending
with buyer acceptance, the manufacturer makes whatever adjustments may
be required to cause the system to achieve contracted first-order
effects in the operating setting. If no problems show up, the
probationary period may be quite short. However, if initial evaluation
reveals substandard system performance, then its cause must be
identified and corrective action taken. It is possible that a series of
correction-evaluation routines will be required--a servomechanistic
process that culminates when contracted performance is achieved in the
operating setting. Perhaps there are instances when costly hardware
proves incapable of adjustment to contract specifications in the
operating setting. In that case, the manufacturer either accepts the
view that the entire effort must be written off or returns the system
to the factory for further development. Usually, the system will prove
capable of adjustment to contract specifications in the operating
setting. That is, in time most such systems that leave the factory will
be accepted by buyers. Complex educational products warrant similar
treatment during a probationary period and should also in time prove
acceptable to buyers in light of performance in the operating setting.

Summative evaluation traditionally has signified hands-off
evaluation of the educational product's lower-order effects during the
period we call the probationary period. The typical allegation is that an
evaluation in the operating setting during a probationary period has
no other purpose than to indicate that the product is or is not a good
buy. This view may be economically tolerable when the product is on
the order of a textbook. Textbooks are rather cheaply developed and
some other textbook always is offstage awaiting its turn when a given
textbook fails. However, we should not allow expensive educational
products to reach advanced development unless they promise to deliver
desired lower-order effects, should not allow their general distribution
and use until a full-scale tryout strongly suggests that they will
perform up to contract specifications in the operating setting, and should
not quit on them in the operating setting when minor adjustments will
cause them to perform according to contract specifications. One cannot
revive the hopelessly dead in the operating setting. However, in
well-managed educational R&D, few products that eventually must be written
off ever will reach installation. For all others, the probationary
period is conceived here as one that insures that promising investments
always will be salvaged. According to this view, the product ceases
to be fluid only when it performs consonant with buyer acceptance.

As educational products increase in complexity and cost, it will
become increasingly necessary to view an initial decision to install
a product in the schools as probationary. Evaluation teams that are
independent of the product development effort then might evaluate
lower-order effects of the product, with findings fed to the
development staff for corrective action when product performance falls
below contract specifications. With buyer acceptance, the product reaches
a form that can be considered final until advances in applicable
states-of-the-art, changes in taste, or evidence of undesirable longer-term
effects necessitates that the product be modified or supplanted.

The standards on which absolute evaluation during the probationary
period could be predicated themselves will evolve as a progression whose
first set consists of adjusted provisional transit rate standards and
whose last set consists of definitively stable transit rate standards.
The first of these sets stems from a full-scale tryout. The modifier
"adjusted" is used because the empirical evidence that the tryout
affords will be based on a situation that differs in several
foreseeable respects. The modifier reflects application of a guesstimation
process to the tryout findings. The adjusted standards might compensate
for the fact that a multiyear service is simultaneously installed in
the tryout setting, whereas its design contemplates longitudinal
installation. They might also compensate for a product's tendency, under
the pressures of parallel development, to employ certain
components--e.g., new equipment, new occupational specialties--in prototypic
form during the full-scale tryout.

One contemplates a succession of sets of standards to be devised
during the probationary period less to serve product evaluation
requirements than to confirm the requirements for school personnel. These
standards must be fair if they are to have any role in defining and
securing performance accountability in the schools. Much technical
work remains to do before agreement can be reached on a
standards-setting perspective. We have no recourse to doing this work
unless we
are willing to evaluate both product and personnel comparatively--the
prevailing inapt strategy. If we want to get the best obtainable
performance, whether from a product development staff or school personnel,
then we must have standards that are set as high as is fair.
Provisional or adjusted provisional standards might suffice for product
evaluation under certain conditions. However, the transit rates that
these standards reflect typically will be lower than the product
warrants when installed in the operating situation. Definition of a
progression of sets of standards is particularly indicated when the
product is a multiyear service, because product performance should
improve year by year in the operating setting until all students
entering the higher-year levels of the service are graduates of the
lower-year levels. We ask the product to deliver improved performance
over the years that are required to transit the student from
first-year entry to last-year exit. And we ask school personnel--who are a
part of the product to the extent that personnel-training routines are
effective--to do their share to insure that product performance increases
from the first to the nth year of the probationary period for the n-year
service. We cannot make these demands within a comparative evaluation
framework. We cannot justify them if we treat the problem of
standards-setting arbitrarily or oversimply. Whether the task is to
evaluate the product fairly or the personnel that the product implicates
in a fair and reasonable way, the probationary period cannot be a
hands-off period for the product development staff.



A large-scale operation to develop a complex educational product
will feature a progression of operations that classify under product
formulation, development, and evaluation headings. Each such operation
can be evaluated. Acceptance of its output removes some options
concerning operations that follow. Products become increasingly mature
and decreasingly fluid as they move through such a progression. The
product is most fluid during formulation stages and minimally fluid
following buyer acceptance.

Scriven asserts a summative evaluative interest in the product
prior to buyer acceptance. However, the summative evaluation domain
that most concerns Scriven in the NIE notes is that of product
higher-order effects, most of which can be expected to show up only in the
years following buyer acceptance. A later section discusses the
conditions under which one can hope definitively to evaluate an
educational product for higher-order effects. It is not precluded that
Scriven's interest in evaluating higher-order effects is "evaluation" in
the policy science sense, where baselines and effects alike are
speculative entities.



CATEGORY OF DECISION-MAKER 

The conventional view of educational R&D appears to distinguish
between a closed-hand classical formative evaluation and an opened-mouth
classical summative evaluation. The classical categories of evaluation 
lead to a view of summative evaluation as decisive rather than
informative. The only decision-maker requiring consideration in the
classical framework is the buyer who does or does not buy. This might prove
satisfactory when simple products are to be developed, but fails to
protect the investment when a costly complex product is to be developed.

Large-scale operations addressing large educational matters require
that both sponsor and development organization reach decisions on the
basis of open evaluation operations. Evaluation that serves sponsor
decisions seeks to establish relevance and worth of the development
effort to date. Evaluation that serves development staff decisions
seeks to establish where and perhaps how product effects must or might
be enhanced. Essentially, the same evaluation findings serve sponsor
and development organization categories of decision-makers. However,
the sponsor seeks to evaluate the development organization for potential
or achieved productivity, whereas the development organization seeks to
evaluate the product (which may include a personnel component) for
potential or achieved productivity. Inherent in both objectives is
the notion of a standard against which the effort will be evaluated.
The formative-summative distinction traditionally has been made a
function of the level of maturation for simpler educational products.
It could alternatively be viewed as covarying with decision-making
category.



LEVEL OF INTEREST IN CAUSE-EFFECT 

One may be interested in partial or total lower-order effects of
a product per se, in lower-order effects of antecedents other than the
product, or in higher-order effects of the product in the context
of all other antecedents. While a longer-term interest in lower-order
effects or even a shorter-term interest in higher-order effects is not
precluded, level of interest in cause-effect typically should covary
appreciably with product maturation.



Effects have two primary dimensions: locus and time (or delay).
Effect locuses are viewed from a standpoint of distance from the
product as antecedent and so are defined independently of maturation level
for the product. However, there can be an effect in a remote locus
only if first-order effects of the product somehow reach the remote
locus.

When a product operates on a given student to produce first-order
effects on the student, it should require increasing time for
such effects to work their way out to increasingly remote locuses.
To the extent that delay time does not covary with the time scale for
product development, level of interest in cause-effect is independent
of product maturation.

An issue raised by Scriven in the NIE notes is whether the
formative-summative distinction should mirror a distinction between lower-
and higher-order product effects or, to use Scriven's terminology,
between main effects and side effects. We will argue that Scriven's
interest in higher-order effects of educational products is legitimate
but that the state-of-the-art for evaluation of social cause and effect
must advance appreciably before it becomes possible to evaluate
higher-order effects more than intuitively. Intuition sometimes cannot be
avoided. Where it is necessary, the sponsor would do well to protect
itself against the possibility that its decisions are based on
idiosyncratic tendencies of evaluation teams that the sponsor employs.

An issue not raised by Scriven but inherent in the view that
higher-order effects are of legitimate concern to the sponsor is the
extent to which antecedents other than the product require consideration.
A paramount stumbling block to efforts to date to evaluate social cause
and effect is that multiple causes lead to multiple effects. If multiple
effects are of interest, then it is highly probable that these
effects--and particularly the higher-order ones--stem in part from antecedents
other than an educational product whose evaluation is of central interest.
Order progressions for antecedents and consequents alike must be
considered when one seeks to establish cause and effect in a broad social
domain. Such progressions for antecedents and consequents that one may
associate with a specified educational product are sketched in Section
IV of the paper.

CATEGORY OF THE COMPARISON STANDARD 

All evaluations involve comparing something with a standard. The 
standard may be intuitive or explicit, arbitrary or rationally-defined,
demanding or undemanding. Comparative-relative standards tend to be
explicit, arbitrary, and undemanding. Criterion-referenced or absolute
standards tend to be explicit and rationally-defined; they may be
demanding or undemanding and should be demanding consonant with
operating cost constraints imposed on exploitation of the applicable
states-of-the-art. Where levels of effect are dichotomized into first-order
and higher-than-first-order, then the standard presently must be
comparative-relative when higher-than-first-order effects require
evaluation. When first-order effects require evaluation, the standard may
be either comparative-relative or absolute.

If one leaves to a development organization or staff all decisions 
concerning how stringent its criterion-referenced product proficiency 
levels will be, then it is understandable that some will conclude that
absolute evaluation places a less stringent hurdle in the path of the
product development staff than comparative evaluation might. If we
block this loophole by defining summative evaluations leading to
decisions by the sponsor concerning the merit of a development staff's
views on criterion-referenced product proficiency dimensions and levels
(or transit rate specifications), then the logic of comparative evalua- 
tion becomes less compelling. Comparative evaluation of first-order 
product effects encourages a product development staff merely to strive 
to exceed the status quo. It encourages staff to do no more than build
a measurably better product when suitably-constrained exploitation of
applicable states-of-the-art should yield a currently best-possible
product. Comparative evaluation is an invitation to underachievement.

Prior to limited tryouts that administer prototypic portions of 
the product to students, all effects of a product are only potential.
If product formulation operations are progressively evaluated as
sketched above, then it should be possible to use absolute standards to
evaluate product first-order effects during limited and full-scale
tryouts and during probationary operation of the product without risking a
conflict-of-interest tendency of staff to ask too little of itself.

Comparative evaluation of social artifacts for first-order effects
is the oldest kind of evaluation. Such evaluation appears most apt
when, in consequence of using the simple two-stage model for
development and evaluation of educational products, no requirement has been
set forth for evaluating product formulation or requiring the product
to represent a best-possible effort. It is ironic that some now
commend such evaluation as epitomizing sound product evaluation. Most
products that are just a little better than those that currently
prevail in the schools would have to be judged not worth fooling with.

While comparative evaluations of first-order effects might be
warranted on occasion, it is difficult--sometimes to the point of
impossibility--to define an acceptable comparison study. Proponents of
prevailing education tend to take their first-order gains along
intangible or fortuitous proficiency dimensions. Critical comparative
evaluations tend to require designers who are as wise, forceful, and
persuasive as Solomon. Only when we can agree that given prevailing
education is as socially relevant as it should be does a straightforward
basis for conducting comparative evaluations exist. Only if the prevailing
product then is considered "pretty much attuned to suitably
cost-constrained applicable states-of-the-art" does it become a tenable
standard against which to evaluate an alternative product.



SOME BROADER ISSUES 



PERSPECTIVES ON HIGHER-ORDER EFFECTS 

In the NIE notes, Scriven is primarily concerned with higher-order
effects of educational "treatments." These he describes as goal-free
(or needs-bound or consumer-oriented) summative evaluations. Such
evaluations address side or higher-order effects. However, Scriven
underdescribes the effects his side effects terminology subsumes. Nor
do Scriven's comments on side effects evaluation go to the heart of the
problem concerning how one will establish baselines against which side
effects can be detected and gauged. If side effects evaluation is to
be more than a purely intuitive exercise, then social indicators must
be provided whose time-referenced series of readings taken prior to
side effects evaluation form the baseline against which the side effect
will be detected and its magnitude established.
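
The baseline logic described above can be sketched in a few lines. This
is one simple reading under stated assumptions, not a method given in the
paper or by Scriven or Bauer: the baseline is assumed to be a straight
line fitted by least squares to indicator readings taken before the
program, and the function names and sample readings are hypothetical.

```python
from statistics import mean, stdev


def fit_trend(years, readings):
    """Least-squares straight line through pre-program indicator readings."""
    x_bar, y_bar = mean(years), mean(readings)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, readings))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def departure_from_baseline(years, readings, later_year, later_reading):
    """Return (departure, spread) for a reading taken after the program.

    departure: later reading minus the value the baseline trend projects.
    spread: standard deviation of baseline residuals, a rough yardstick
            for how large a departure must be before it is noteworthy.
    """
    slope, intercept = fit_trend(years, readings)
    residuals = [y - (slope * x + intercept) for x, y in zip(years, readings)]
    expected = slope * later_year + intercept
    return later_reading - expected, stdev(residuals)


# Made-up indicator readings for four pre-program years, then one later year.
print(departure_from_baseline([1, 2, 3, 4], [10, 12, 15, 16], 6, 25))
```

The sketch presupposes what the text says is usually missing: readings
collected well before the program exists. With no prior series there is
nothing to fit, and the departure can only be intuited.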

The system of general economic indicators has been under
development since the 1930s. While a step beyond nothing, all who know the
system agree that it is much less than sufficient for predicting
economic effects or characterizing economic cause and effect. Economic
and educational antecedents enter into a broader domain of social cause
and effect. Bauer (1966) and his associates have been working toward
a system of social indicators that can be employed in the broader domain
of social cause and effect.3 Scriven apparently does not believe that
his interest in educational side effects requires him to address the
broader domain. Conversely, Bauer believes that higher-order
consequences of large social programs cannot be established unless evaluation
is antedated by an operational system of social indicators that
provides firm baselines against which higher-order consequences can be
detected and gauged. He also believes that it will not be possible to
move far beyond evaluation of first-order consequences of a program if
one must wait for the program to come into being before the evaluation
effort gives attention to the system of social indicators to which
the evaluation of higher-order consequences of the program will be
baseline-referenced. Thus, he is drawn to the view that the system of
social indicators should generally reference to social need, rather
than specially reference, through a specified program, to facets of
social need.

3 Land (1972) overviews more recent efforts to design systems of
social indicators for use in establishing social change.

Bauer distinguishes between short-term second-order consequences
whose social indicators might be specially referenced to the program
to be evaluated and longer-term second-order consequences whose social
indicators must be generally referenced to social need if evaluation is
to be more than an ex post facto footnote to history. Although Bauer's
longer-term second-order consequences are here treated as a progression
of higher-order effects, the progression accepts Bauer's position.
Even considering the perspectives that Scriven builds into the
apperceptive masses of his goal-free summative evaluators, his conception
of side effects at best references only to the educational portion of
social need--and then only if the evaluator is an educational Leonardo.
Hence, Scriven is constrained either to view long-term educational
cause and effect as identifiable from within less than the larger
framework that is operating on education or to postpone the search until
appropriate baselines are established, where the baseline effort is
initiated only after the program to be evaluated is identified. One
doubts that Scriven would opt for the first of these Hobson's choices.
I see no alternative to the second unless the evaluation team is
permitted to intuit its baselines--not a very giant step forward.

Bauer distinguishes between the special short-term consequences of
programs of an agency such as NASA and general longer-term consequences
of these programs in the larger context of all social programs. Bauer's
special short-term consequences include first-order effects and more
immediate higher-order effects that an agency such as NASA can
anticipate. In Bauer's view, evaluation of longer-term consequences of
particular programs cannot aspire to be timely unless a system of social
indicators that is analogous to the existing system of economic
indicators is operational well ahead of the evaluation team's need to
evaluate long-term effects of particular programs. His position is that
the longer-term effects of a particular program are to be found in an
aggregate social accounting system that is responsive over time both
to a particular program and to all other programs that represent social
action during the period antedating the search for longer-term effects.
Bauer and his associates address the problem of creating a social
accounting system whose dimensions are sufficiently inclusive to bear
upon longer-term evaluation of diverse social programs and whose
measures over time provide a basis--quantitative where possible,
qualitative where necessary--for explicating evaluation of longer-term
effects. Conversely, Scriven either assumes that such a social
accounting system presently exists or that ad hoc selection of its
pertinent dimensions and consequent collection of baseline data can
occur within the timeframe for summative evaluation of a specified
program.


Bauer provides a useful first cut on the classification of effects--
one that is implicit in Scriven's distinction between main (or
lower-order) and side (or higher-order) effects. In addition, Bauer
distinguishes between short-term and long-term higher-order effects (or,
in his terminology, second-order consequences), thus preliminarily
partitioning these effects. Both Bauer and Scriven seem primarily
interested in long-term higher-order effects. However, the two are
differently oriented concerning how these effects might be established.
Such effects do not fall outside this paper's domain. However, it is
likely that progress in capability for evaluating long-term effects
will not occur prior to extensive intellectual and dollar investments
of the sort described by Bauer and his associates. If this is correct,
then summative evaluation of educational services during the next
decade or so is not going to do a very convincing job of evaluating
long-term higher-order effects. It is not too early to begin trying
to characterize such effects more explicitly. However, these efforts
in the shorter term should contribute much more to the state-of-the-art
that Bauer sketches than to improving the performance of evaluation
teams seeking to establish longer-term higher-order effects.

We may speculate--as is common among practitioners of contemporary
policy science--concerning how existing programs are affecting society
outside the realm of planned or anticipated lower-order effects of
these programs. Disciplined speculation concerning such effects
probably should be stimulated by a sponsor that seeks to evaluate its
investment in educational R&D. While speculation is not evaluation,
even in Noah Webster's sense of estimation, we might view the disciplined
speculation of a policy science as yielding policy science evaluations
that contrast with the evaluations of evaluation science. When we say
that the state-of-the-art as yet does not permit convincing evaluation
of higher-order effects of an educational product, the reference is to
evaluation science evaluation. Policy science evaluation, grounded as
it is on a good deal of intuition concerning both inputs and outputs,
is not precluded. My suspicion is that the side effects evaluation
that Scriven would have a sponsor fund is policy science evaluation for
the most part. There is little point in resisting this approach. Until
evaluation science state-of-the-art for higher-order effects is
appreciably advanced, the weaker policy science approach to evaluation
of such effects may be all that is available to those who are concerned
with these effects.



Contemporary policy science antedates long-time operation of a
system of social indicators and so is a system for reaching policy
decisions on the basis of an impoverished information base. At its
disposal are diverse aggregate social statistics, some permitting
disaggregation when the occasion requires. Available input statistics
include number of teachers, teacher salaries, teacher-pupil ratios,
per student costs, average daily attendance, and student distributions
of various sorts--e.g., across a socioeconomic scale. Output statistics
tend to be those that are appropriate to evaluating the educational
institution as a babysitting service--e.g., years of education and
degrees attained. The policy science that might exist several years
following installation of a social reporting system such as Land (1972)
advocates conceivably will have appreciable scientific power. The policy
science discussed here is the one that is presently available, which
cannot be better than its information on inputs and outputs.






ORGANIZATIONAL AND INDIVIDUAL ROLES IN IMPROVING PRODUCTIVITY

Scriven (1967) limited the formative-summative distinction to the
educational R&D context. Bloom et al. (1971) attempt to bring the
distinction into the context of educational practice. The present paper
recognizes that a large amount of educational product development
currently is done on an individual pre-industrial basis by the same
persons who are responsible for rendering the various educational
services--the classroom teachers. That classroom teachers develop and
use certain materials day-by-day is an invitation that someone was bound
to accept to define formative-summative evaluation categories on
operations of the classroom teacher. However, if development of a modern
mathematics textbook represents "little educational R&D," then what the
classroom teacher can do with available resources must be miniscule
indeed. There seems little point in confusing the Scriven and Bloom
et al. views on formative-summative evaluation, which most clearly
share only Scriven's terminology.

Many currently argue that all product development should be done
by the classroom teacher. Such arguments ignore both that the classroom
can only support "miniscule R&D" and that the larger "little R&D"
to which we have referred is inherently limited by comparison with a
responsive and responsible "big R&D." In this light we consider again
the position eloquently advocated by Atkin and Grotelueschen (1971).
From the indisputable premise that the teacher is the final
decision-maker concerning what goes on behind the closed classroom door
they derive the conclusion that large, organized efforts that place
teachers "at the end of a development/innovation line in which they are
expected to implement the bright ideas of someone else" must fail. They
also are concerned with counteracting an elitist "social planning that
assumes that a particularly wise and prestigious group is possessed of
an adequate educational vision to warrant investment of our major
available resources in an attempt to replicate that vision throughout
the countryside." These are separable points of view.

Centralized versus decentralized social planning is a false
dichotomy. The Soviet Union has by now proved the dangers of
highly-centralized planning in an elitist government bureaucracy. That
government even in the United States is underresponsive to the needs
and wishes of too many of its citizens in those domains where
government's role is paramount is well known. Yet government and
associated large segments of private industry continue to grow, as we
seek to come to grips with complex interrelated antecedents underlying
a present imperfect social fabric. In seeking to improve the student's
lot within an industrial engineering framework fostered by government,
we must somehow avoid bureaucratic tunnel vision and arbitrariness. That
does not argue that the entrepreneurial model wherein hundreds of
thousands of individuals vie for their own personal, uncoordinated
pieces of the action--each usually based on a single-dimensional view
of problem antecedents--can serve society as well as larger schemes
that are responsive to all of the antecedents (and consequences) that
are germane to improving the lot of the student and of society. Whether
the entrepreneur is a single professor in a school of education or a
classroom teacher, we cannot hope to make education much better than
it is so long as we continue to view the problems as resolvable by
many thousand uncoordinated organizations of the one-man show variety.
That does not argue that an occasional Louis Braille or Sequoyah will
not appear from time to time and set large matters straight that great
educational industries uniformly misperceive. Nor does it argue that
we ever should create the educational situation that deprives the
inspired teacher of the opportunity to do much better than an engineered
educational service might allow. When a teacher exceeds performance
standards for a service, then one should agree with Atkin & Grotelueschen
that we try to determine what it is that such a teacher does that leads
to such a consequence. However, the Brailles, Sequoyahs, and inspired
classroom teachers are irrepressible. Like cream, they will rise to
the top if some set of standards that is akin to milk is available for
use in comparative evaluation. A mix of effort at both the level of
formulating the full-service school and the level of engineering it is
in order. It is highly unlikely that such efforts will produce a
monolithic educational vision that we "attempt to replicate...throughout
the countryside." Rather, they most probably will produce an
inventory of designs that, taking into account the sum of the
applicable knowledge that is now available, promise to be much more
complex than most individuals operating independently ever could hope
to achieve.

However strong the teachers' unions grow, it does not appear
compelling that society must accept educational tyranny, whether in the
classroom or in administrative offices. The closed classroom door to
which Atkin & Grotelueschen refer is a barrier that few who labor for
remuneration have been allowed to interpose between themselves and
those who meet the payroll. Those who defend the closed classroom
door on grounds of academic freedom would do well to give equal stress
to the fact that privacy can also be used as a license to steal.
Accountability remains a fuzzy notion that masks a variety of
motivations. Yet we cannot defend two standards--a relaxed one for those
who invoke professional mystique and a stringent one for those who do
not. It does not seem reasonable that those who earn a living in
any professional field should escape provisions of a fair and
reasonable accountability standard.

There always will be professional outputs that are truly
professional because, lying at or beyond the frontiers of codified
knowledge, they represent new discoveries. Most professionals in every
field would be out of business quickly if their livelihood depended on
their operating at this level more than infrequently. Rather, most
professionals are necessarily technicians most of the time. However
complex, the technical component of professional work can be specified
and evaluated. Any professional effort that goes beyond the
specifiable technical work requirements for doctors, lawyers, teachers,
or educational R&D personnel merits special credit. Such effort yields
bonuses that go beyond a technical standard.

If we partition professional effort into technical and professional
components, then professional carte blanche loses tenability. The
dilemma of teachers is that they seek the professional carte blanche
that traditionally has been extended to other professional groups at
the very time in history when it is becoming clear that this costly
privilege must be scaled down in the other groups.

Atkin & Grotelueschen merely advocate a form of educational R&D
that is predicated on teacher entrepreneurs. Bloom et al. (1971)
provide such a teacher/innovator with the formative-summative evaluation
tools that the role requires. One does not quarrel with the tools that
Bloom et al. provide. This would be pointless, since the real quarrel
is with the premise that the teacher/innovator must be the exclusive
force in educational R&D.






IV 



CAUSE-EFFECT STRUCTURE 


Needless debate concerning what evaluation work needs to be done
when--e.g., during the full-scale tryout, the probationary period, or
the period following buyer acceptance--can be avoided by differentiating
cause-effect progressions sufficiently to show which cause of which
effects one might evaluate. This section more concretely characterizes
the complex educational product and cause-effect progressions pertinent
to complex product evaluation.



SCHOOL SERVICES

The schools provide mandated and elective educational services,
using primary structures that are service-specialized and secondary or
support structures that apply to two or more services.

Schools dispense several classes of educational service. Three
that seem characteristic of the contemporary school are a) instructional
services, b) enrichment services, and c) child care-socialization
services. Each of these classes subsumes a set of domain-referenced
benefits. Reading illustrates a particular instructional service. Its
benefits are reading skills. Observation or discussion of a large
shopping center illustrates a particular enrichment service. Its
benefits are orienting schemas, whether for exemplary shopping centers
or for a generalized view of shopping centers. Classroom attentive
behavior and behavior in social situations illustrate socialization
services. The educational effects of a socialization service should be
to maximize the value of instructional and enrichment services.

An instructional service is designed to render students first-order
proficient along each of several proficiency dimensions. An
illustrative proficiency dimension for reading is decoding printed
English monosyllables of specified novelty to speech. The student
comes to the instructional service slightly proficient in decoding skill
and should leave the service highly proficient. Whether this happens
depends on more than the characteristics of an instructional program.
It also depends on characteristics of instructional management, the
extent to which student and manager receive apt feedback concerning
student progress along the decoding proficiency dimension, and other
factors--some arising within the service and some outside it.

An enrichment service is designed to broaden or further
differentiate the universe schema that a student brings to a specified
instructional service or to informal learning. An enrichment service
may be prerequisite to one or more instructional services.
Alternatively, the service may have terminal intent, as when instruction
of the survey or orientation variety is given. While proficiency
dimensions of enrichment services have received less attention than
those of instructional services, such dimensions are in principle
specifiable. They can be defined on scope, differentiation, and
organization of conceptual schema.

A child care service typically has two functions. The first,
custodial baby-sitting, apparently has no educational content and will
not be further considered. The second is to advance the child for
social behaviors that are of concern both inside and outside the school.
If evaluation is confined to manifest behaviors--as opposed to
internalized antecedents of manifest behaviors--then the behavioral
effects of applications of socialization protocols are discernible and
quantifiable to the extent that their evaluation against criterion
behaviors requires.

If an evaluation team is required to evaluate a particular
educational service, then the other services to which the student is
exposed provide an intraschool context to evaluation of the particular
service. One searches for first-order effects of the particular service
in student performance along that service's proficiency dimensions and
for second-order effects in student performance along proficiency
dimensions of the contextual services.

Certain of the school's central services have educational import.
A system for processing and reporting proficiency test data to interested
audiences is illustrative. Such a system addresses a variety of the
school's educational services and so is denoted a cross-service
component of the school. A cross-service component is useful to the
extent that it favorably affects student performance along
proficiency-behavior dimensions for those services that the component
serves. If we define component first-order effects on a field of
first-order effects for services served by the component, then designs
for evaluating the cross-service component must be more complex than
designs for evaluating an educational service.

Establishment of higher-order social cause and effect is in its
infancy. Less well recognized, establishment of social cause and
effect at lower orders is a more complex business than is typically
acknowledged, for the educational service or cross-service component
is only one antecedent or determinant of a product's first-order effects.
These complications are discussed below, first for effects and then for
antecedents.


DIFFERENTIATION OF EFFECTS 

Heretofore, the tendency of public agencies controlling educational
R&D and of legislatures controlling education has been to constrain
the educational services designer or practitioner with regard
to a subject matter domain but not with regard to the proficiency
dimensions or criterion levels that an educational service will
negotiate or attain. We assume below that the designer of an educational
service also will be constrained to address certain proficiency
dimensions falling in the skills domain and charged to design a service
that transits the student to proficiency levels that are as high as
applicable states-of-the-art, exploited to the extent that operating
cost specifications allow, will permit.

The effects of the educational service, measured along mandated
proficiency dimensions, are first-order effects of the service. Noted
earlier, these really are so-called first-order effects--as can be
shown using appropriate multivariate experimental designs--because they
may be inflated or deflated by the operation of antecedents other than
the educational service under evaluation. Until we are able to separate
effects of the particular service from other antecedents, effects of
every order will potentially reflect the play of other antecedents and
so will warrant the label "so-called."

A first-order effect of a reading instructional service might
occur along a student proficiency dimension for decoding English
monosyllables of specified novelty to speech. A first-order effect of a
mathematics instructional service might occur along a student
proficiency dimension for summing arrays of two-digit numbers of
specified array length.

A specified educational service might affect student performance
in some other educational service. Thus, first-order student
proficiencies resulting from reading instruction might enhance (or
interfere with) first-order proficiencies resulting from mathematics
instruction. If mathematics instruction is designed to take reading
proficiency as prerequisite, then both reading instruction and
mathematics instruction should contribute to what typically are regarded
as first-order effects of mathematics instruction. Such effects then
contain a mathematics-referenced first-order component and a
reading-referenced second-order component, or second-order effect.
Effective reading instruction might also favorably affect social
behaviors in the school. Conversely, effective socialization services
might favorably affect reading proficiencies. When the effect of a
specified educational service of the school extends to student
performance in a second domain of educational service of the school,
the second-domain effect is a second-order effect of the first service.
The overall second-order effect of a specified educational service is
the sum of its second-order effects in all second domains of the school
wherein the student receives service. If we seek to get at second-order
effects somewhat definitively during a full-scale tryout, then summative
evaluation must result that is more complex than what we typically have
in mind when considering full-scale tryouts. It is possible to make such
evaluation increasingly complex by substituting the cross-service
component of a full-service school--e.g., a system that processes
proficiency test data and reports proficiency status of the classroom to
interested audiences--for the service product.
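
The bookkeeping described in this paragraph can be sketched in a few
lines of Python. The sketch assumes that an evaluation has already
produced an effect estimate for each (service, domain) pair; the
function name, the data layout, and all of the numbers are hypothetical
illustrations, not results or procedures taken from this paper.

```python
from collections import defaultdict


def partition_effects(effects, home_domain):
    """Split estimated effects into first-order and overall second-order.

    effects:     dict mapping (service, domain) -> estimated effect on
                 student performance in that domain (hypothetical units).
    home_domain: dict mapping service -> the domain the service addresses.
    """
    first_order = {}
    second_order = defaultdict(float)
    for (service, domain), value in effects.items():
        if domain == home_domain[service]:
            # Effect along the service's own proficiency dimensions.
            first_order[service] = value
        else:
            # Overall second-order effect: sum over all second domains.
            second_order[service] += value
    return first_order, dict(second_order)


# Hypothetical estimates, for illustration only.
effects = {
    ("reading", "reading"): 12.0,
    ("reading", "mathematics"): 3.0,
    ("reading", "socialization"): 1.5,
    ("mathematics", "mathematics"): 9.0,
}
home = {"reading": "reading", "mathematics": "mathematics"}
first, second = partition_effects(effects, home)
```

The sketch takes the per-domain estimates as given. Obtaining them is
the hard part, since each estimate may itself reflect antecedents other
than the service under evaluation.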




Second-order effects show up in the same student whose performance
reveals first-order effects. Second-order effects are geographically
and temporally focused in the school. They may be short-term,
referencing to second domains to which the student is introduced
concurrently or soon after introduction to a first domain, or they may
be long-term, referencing to second domains that the student will not
enter for a long while. It is likely that the summative evaluator will
give much greater effort to the evaluation of short-term second-order
effects than to longer-term second-order effects, for evaluation of
longer-term second-order effects poses some of the same data system
problems that evaluation of Bauer's longer-term effects does.

Summative evaluation that is concerned with the totality of a
school's educational services over year levels need not distinguish
between first- and second-order effects as characterized above. Such
evaluation is legitimate if its purpose is to support a decision to
accept or reject the system of educational services as a unitary
package. The trend seems to be in the other direction--to find out what
elements of a large package are working well and what elements poorly,
so that one can then proceed selectively when installing educational
services or directing R&D efforts where most needed. To collapse the
distinction is to surrender useful information.

Educational services also may affect student performance or
behavior outside the school. In the short term, these services may
affect how the student performs as an individual or in a social setting
at home or in the community. In the long term, they may affect how
the student performs socially and economically in adult life. Such
effects of educational services are denoted here third-order effects.
Third-order effects differ from second-order effects by occurring
outside the school and so at a geographic locus that is more remote from
the educational services to which they reference than are second-order
effects. However, the primary reason for distinguishing between
student performance and behavior in and outside the school is that the
dimensions and data underlying evaluation of second-order effects are
(or should be) built into the school's educational services, whereas,
as with Bauer's long-term effects, the extensive data system that
underlies evaluation of third-order effects remains to be designed and
placed in operation. To evaluate third-order effects, we first need
to decide what performances and behaviors of the student in home and
community are pertinent and then secure enough time series data falling
along these performance-behavior dimensions so that we can distinguish
between baseline states and any changes that may result from
introduction of novel educational services or designs.
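
The need for baseline data can be illustrated with a deliberately crude
Python sketch. It assumes a single performance-behavior dimension
measured repeatedly before and after a new service is introduced. The
function name, the threshold of two baseline standard deviations, and
the numbers are all assumptions made for illustration; the comparison
ignores trend and every antecedent other than the service.

```python
from statistics import mean, stdev


def change_from_baseline(baseline, post, threshold=2.0):
    """Compare post-introduction observations with a baseline series.

    Returns (shift, flagged): the difference between the post mean and
    the baseline mean, and whether that shift is at least `threshold`
    baseline standard deviations (an arbitrary, assumed cutoff).
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    base_sd = stdev(baseline)
    shift = mean(post) - mean(baseline)
    if base_sd == 0:
        # A perfectly flat baseline: any nonzero shift counts as a change.
        return shift, shift != 0
    return shift, abs(shift) / base_sd >= threshold


# Hypothetical yearly observations, for illustration only.
baseline_years = [50, 52, 49, 51, 50]
shift_a, flagged_a = change_from_baseline(baseline_years, [56, 57, 55])
shift_b, flagged_b = change_from_baseline(baseline_years, [50, 51])
```

Without the baseline series there is nothing against which to judge
either post-introduction sample, which is the situation the text
describes for third-order effects.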


A student who is affected by a given educational service or set of
services might manifest these effects sufficiently in home and community
to become himself the immediate determinant of effects on parents and
siblings in the home and other individuals with whom he comes into
contact in the community. In the long term, the school's educational
services might operate, through the student, on his companions of
adult life and on his own children. Such effects are denoted here
fourth-order effects. It will be merely repetitive to note the
dimensionalization and data needs that serious attempts to evaluate
fourth-order effects [entail].

One can imagine higher-than-fourth-order effects wherein those who
are fourth-order affected by educational services affect others, who
affect others, etc. While such remote effects of educational services
[...] of the side effect/second-order consequences that Scriven and
Bauer have in mind, we can only imagine [...] coming into being [...]
contagion wherein effects of education are transferred from person to
person [...] a preponderance of the society is so infected. [...] not
question the legitimacy of evaluation at this level--here denoted
post-fourth-order effects. The problem is that we are not yet [...]
attempt seriously to evaluate post-fourth-order effects. Bauer appears
[...] more clearly than Scriven to understand the prerequisites [...]
this sort. One of these prerequisites [...] operating within the
framework of a general [...] system. For only when one can establish
baselines over time does it become possible to establish whether a
higher-order effect [...] has occurred and, if so, in what direction.

Given the framework [...] social accounting system, we probably
would inherit all of the problems of those who attempt to use the
current system of economic indicators to establish cause and effect.
That system, as we know, permits an economist to say that something is
happening that is unusual. Usually, lots of these things are happening
and most of us can say rather explicitly what they are. The problem is
that the system does not yet permit economists to speak with one voice
(or two, or three) concerning what the antecedents are of the general
higher-order effects that the system of economic indicators reveals.
I share with Scriven, with Jencks et al. (1972), and others a concern
whether education has long-term effects outside the school and, if so,
what form these effects take. However, the tools at hand appear
insufficient to permit us to evaluate higher-order or longer-term
effects of education more than cursorily.

It is likely that we can extensively evaluate first- and
second-order effects of specified educational services anytime we choose
to allocate the needed funds to such an effort. Third-, fourth-, and
post-fourth-order effects (if any) of the school's educational
services should tend to occur in measurable amounts only if educational
antecedents more nearly correspond to the totality of educational
services than to a particular service. Until we can dimensionalize and
measure the various antecedents to these higher-order effects--of
which the schooling antecedent is only one--the basis will not exist
for fine-grained analysis of cause and effect at higher levels. Rather,
the data system and the baselines that it affords will predispose us,
like Jencks et al., to search for large generalizations that are based
on gross characterizations of antecedents and effect characterizations
that result from averaging procedures.



Few probably would subscribe to the proposition that crudity of
tools compels complete inaction in the domain of higher-order effects.
Still, a mass of recent evidence that is forged from such tools
suggests the possibility that we are creating a false understanding of
educational higher-order effects when we use contemporary machinery to
establish educational cause and effect. The weight of this evidence--
cf. Stephens (1967), Jencks et al. (1972)--suggests that nothing that
education does much matters, with offstage overtones that perhaps we
should accept the null hypothesis in the educational domain. That is,
when we sample in somewhat arbitrary ways--consonant with whatever
baseline data happens to be available--from the full domains for
antecedents and consequences and thereafter relate both antecedents and
consequents to a population as average values, the chances are very
good that the generalization will be reached that the laws of cause
and effect have been repealed in the educational domain. A single
event occurring in psychoanalytic space may engender extended
traumatic behavior. People may commit suicide or turn to crime if their
income is less than half of the national average for a period of time.
Certain patterns of events occurring prior to age 8-13 may predispose
the child to delinquency and behavior disorder. The entrepreneur's
success may predispose him to outstanding effort. Great leaders may
inspire. Yet the great crude studies of educational higher-order
effects tell us that 12 years of education will neither harm nor help
the individual or the larger society. How novel, quaint, and exotic.
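
One way in which averaging can yield an apparent null result is easy to
show numerically. The Python sketch below assumes two hypothetical
subgroups whose effects are real but opposite in sign; the group labels
and values are invented for illustration and are not drawn from the
studies cited above. It shows only one of several mechanisms by which
population averages can mask cause and effect.

```python
from statistics import mean

# Hypothetical per-student effects of some service, in arbitrary units.
subgroup_effects = {
    "helped": [8.0, 10.0, 12.0],
    "harmed": [-12.0, -10.0, -8.0],
}

# Averaging within each subgroup preserves the effects.
per_group = {name: mean(values) for name, values in subgroup_effects.items()}

# Averaging over the whole population cancels them.
pooled = [v for values in subgroup_effects.values() for v in values]
overall = mean(pooled)
```

Here each subgroup shows a sizable average effect, while the pooled
average is zero, so a study that reports only the pooled value would
conclude that the service does nothing.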

Crude evaluations of higher-order effects of education probably
will continue to be conducted until the basis exists for finer-grained
evaluations. For those who believe that we require summative
evaluation at higher levels, the first priority effort should be less to
clamor for additional crude evaluations--which inevitably will occur--
than to seek to devise, install, and perfect the social accounting
system that, envisioned by Bauer and his associates, underlies more
effective evaluation at higher levels.



DIFFERENTIATION OF ANTECEDENTS

Where one seeks to evaluate effects of an educational service,
then the service itself is a first-order antecedent. If we follow the
structure provided above for effects, then earlier and concurrent
services that the student has received and of which the manager is aware
stand as second-order antecedents to evaluation of effects of a
specified service.






The manager is partially programmed by a routine that leads him
to understand his role in the service. The manager is a component of
the service to the extent that he performs as the routine specifies.
He is independent of the service to the extent that he malperforms or
transcends provisions of the routine (e.g., by bringing an exceptionally
favorable personality or ingenuity to bear). The student also has some
characteristics that the service anticipates and some that it does not
and so in part is a component of the service and in part independent of
the service. Momentary events and longer-term characteristics of the
lives of students and managers outside the school give them
service-independent characteristics that are denoted here third-order
antecedents. A recent crime, a war, a long-term socioeconomic condition,
or the company that a student keeps may enter the school as a
student-referenced third-order antecedent. Similar events and conditions
plus effects of schools of education, in-service training, and exposure
to general and professional media enter the school as manager-referenced
third-order antecedents. Often it is more effective to directly evaluate
social effects in terms of student and manager behaviors and
performances that, as third-order antecedents, are brought to the
educational service than in more abstract terms--e.g.,
socioeconomic--referencing to the community.


Social ills find their way into the schools on the shoulders of 
third parties when the schools are under grave attack by the community 
or some portion of it. When evaluation occurs in a confrontation
climate, then it may be necessary to consider third-party effects that
are here denoted fourth-order antecedents.

The fifth-order antecedents that come most quickly to mind are
those that illegitimize evaluation of an educational service by asking
it to perform under conditions that are contradictory to its design.
Thus, funding slashes and countermanding administrative directives that
make it impossible to render a service as designed are fifth-order
antecedents that transform a service into a caricature of itself. The
evaluation team is left then to determine whether the caricature is
effective. 

All of the antecedents thus far cited are educational antecedents. 
They enter the schoolhouse through channels or outside channels. Above
we implied that education must have some higher-order effects. That is
not to argue that a higher-order effect is not also a consequence of
antecedents that fall outside the educational domain. Entertainably,
every effect that might be of interest to educational evaluation is a
joint function of educational and noneducational antecedents. It
should be the case that lower-order effects defined narrowly on first-
and second-order educational antecedents would be more a function of
such antecedents than higher-order effects defined more generally on
social need, which should give greater play to higher-order educational
antecedents and to noneducational antecedents. Were we to
characterize a wide range of noneducational antecedents having shaky
baselines and then use gross study methodology to determine the
contribution of each of a totality of antecedents to a higher-order
effect defined on social need, it would not be surprising to discover
that lower-order educational antecedents are not the only ones that
would show up ineffective. Given equally shaky baselines across the
board, the grand conclusion should fall out that nothing really matters.
To the extent that Jencks et al. can point a finger at anything, they
point to an antecedent whose baseline data is not shaky.

When the schools are not under grave attack and when school
administrators have the resources and display the determination to
operate an educational service as designed, summative evaluation
probably needs to consider only first-, second-, and third-order
antecedents of lower-order effects. All of these antecedents can be
evaluated at the locus of the service. Where the school is under attack,
it may be necessary also to consider fourth-order antecedents.

Where the school is not under attack, then so-called first-order 
effects are a function of first-, second-, and third-order antecedents. 
Those theorists who are attracted to higher-order effects might pause
to consider how tenuous such an enterprise becomes when we are forced
to admit that not even first-order effects can adequately be accounted
for simply by referencing them to a first-order educational antecedent.
It appears warranted that the domain of antecedents will expand--and
more than linearly--as one mounts the order scale for effects.


If mountains are there to be climbed, then we will ascend the 
mountain of summative evaluation. However, the state-of-the-art for 
evaluation of social cause and effect is such that a practitioner of
evaluation cannot hope to get much higher than a first base camp at
present. The higher levels should for the most part during the next
decade be the province of those whose interests and talents are
consonant with advancing state-of-the-art for evaluation of social cause
and effect.


EXEMPLARS OF LOWER-ORDER EFFECTS EQUATIONS

Most references to evaluation thus far made have assumed a
first-order antecedent that takes the form of a specified educational
service. When the first-order antecedent is such a service, then
first-order effects are evaluated in terms of student performance along
pertinent dimensions for the service. These dimensions might be
enumerated in consequence of the joint efforts of a development staff, 
a sponsoring public agency, and consultants who are available to both. 


Alternatively, the first-order antecedent might be a cross-service
component of the full-service school. An example is a system
that insures the flow of first-order data for the different services to
all interested audiences--e.g., service managers, administrators, and
parents. Such a system might be used to process, organize, and report
progress, by individual and class, for each of the school's
instructional services. In this event, the first-order effects of all
instructional services might be established when the system is in use
and when an alternative first-order antecedent replaces the system
(e.g., whatever is customary, including the default system whereby no
first-order data flows from the classroom concerning any instructional
service). The system's first-order effects then are established by
comparing it with an alternative (including nil) system for effects on
first-order effects of the different services. Positive first-order
effects of the system are, of course, further evaluated within a
cost-return framework.

When the first-order antecedent is a specified educational
service and the second-order antecedent is one or more second-domain
services, then the second-order effects of the first-order antecedent
show up in the first-order effects of the second-domain services.
Where the intent of the first-order antecedent is to supplant a
prevailing version of the service whose objectives are to transit the
student along identical proficiency dimensions, then second-order
effects of the new service relative to the prevailing service can be
established comparatively. If the new service has more desirable
second-order effects on second-domain services than does the prevailing
service, then the first-order effects of second-domain services will
be more desirable when the new service is the first-order antecedent
than when the prevailing service is. It will tend to be the case that
a new service that has an edge in second-order effects but performs
in an inferior way for first-order effects will prove unacceptable.
Cost-return considerations apply to all examples.

When the first-order antecedent is a cross-service component of
the full-service school, evaluation of second-order effects requires a
greater investment than when a cross-service component is evaluated
for first-order effects or a service is evaluated for second-order
effects. The niceties ignored--e.g., random block design--a two-factor
factorial design is required when the task is to evaluate second-order
effects of a cross-service component. To evaluate such effects, the
evaluation design must reflect alternative versions of the
cross-service component and, as a minimum, alternative versions of a
specified service. The difference between second-domain first-order
effects of the two versions of the service for one version of the
cross-service component constitutes second-order effects for the new
version of the service. The difference between these second-order
effects for the two versions of the cross-service component constitutes
second-order effects for the new version of the component.
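
The two-factor comparison just described can be sketched numerically.
The following is a minimal illustration only, assuming a 2 x 2 design
with one mean score per cell; the cell values, the version labels, and
the function names are hypothetical and do not come from any study
cited here.

```python
# Hypothetical mean first-order scores of second-domain services, one per
# cell of a 2 x 2 design. Keys: (component version, service version).
cell_means = {
    ("old_component", "old_service"): 50.0,
    ("old_component", "new_service"): 54.0,
    ("new_component", "old_service"): 53.0,
    ("new_component", "new_service"): 60.0,
}


def service_second_order_effect(means, component):
    """New-minus-old service difference under one component version."""
    return means[(component, "new_service")] - means[(component, "old_service")]


def component_second_order_effect(means):
    """Difference between the service effects under the two component versions."""
    return (service_second_order_effect(means, "new_component")
            - service_second_order_effect(means, "old_component"))


effect_under_old = service_second_order_effect(cell_means, "old_component")
effect_under_new = service_second_order_effect(cell_means, "new_component")
component_effect = component_second_order_effect(cell_means)
print(effect_under_old, effect_under_new, component_effect)
```

With these invented values the new service gains 4 points under the old
component and 7 points under the new one, so the component's
second-order effect is the 3-point difference between those two gains.
A real evaluation would of course estimate the cells from data and
attach error terms to each difference.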

Higher-order educational antecedents ignored, evaluation design
increases in complexity as one moves from the service to the
cross-service component as a first-order antecedent and from first- to
second-order effects. Even when antecedents are viewed narrowly in terms
of first- and second-order domains, the complexity of apt evaluation
designs mounts with elevation of interest to higher-order effects.
Evaluations of the sort sketched above may be appropriate to the
full-scale tryout and to probationary installation-operation antedating
buyer acceptance. It is doubtful that evaluations of higher-order
effects could occur prior to buyer acceptance of the service, set of
services, cross-service component, or set of cross-service components.

There is virtually no way to systematically vary third-order
educational antecedents without deliberately reforming the society to
conform to provisions of one's experimental design. Antecedents above
second-order typically can be varied only in the fortuitous sense of
selecting schools in different socioeconomic neighborhoods or having
other characteristics defined on demographic central tendencies. When
this is done, the characterization of a statistical treatment group
tends itself to be no more than a hypothesis concerning what
characteristics of the third-order antecedent are pertinent. Moreover,
the characterization, if a central tendency, might come near to
describing an empty set--with people around it but no one actually
there. Problems do beset us when we harken to Scriven's laudatory call
to become more ambitious in the educational evaluation domain. Surely in
depth we must. In scope the constraints noted above presently preclude
much change.
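
The remark that a central tendency may describe an empty set can be
made concrete with a small sketch. The income figures, the 5-unit
window, and the variable names below are invented for illustration and
assume nothing about any actual school population.

```python
from statistics import mean

# Hypothetical household incomes (in thousands of dollars) for a school
# that draws on two quite different neighborhoods.
incomes = [8, 9, 10, 11, 12, 38, 39, 40, 41, 42]

# The central tendency used to characterize the treatment group.
center = mean(incomes)

# Households whose income actually lies within 5 units of that center.
near_center = [x for x in incomes if abs(x - center) <= 5]

print(center, near_center)
```

Here the mean falls midway between the two clusters and no household
lies near it, so a treatment group characterized by that mean describes
nobody who is actually in the school.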






V



ENDS AND MEANS



A DECISION PERSPECTIVE FOR SERVICE PRODUCTIVITY

A multiyear service structure whose function is to transit
students along specified outcome dimensions will do so productively to
the extent that structure is consonant with productive function. A
first general objective of an educational R&D program is to create the
wherewithal for a service whose theoretical productivity is as high
as we could hope to make it in light of applicable states-of-the-art
and controlling educational cost constraints. A service's theoretical
productivity is what it would do per unit cost if all of its
components--and particularly its personnel--perform up to capability.
When the service is installed on a probationary basis, its achieved
productivity--evident in its performance--should fall below its
theoretical productivity for a number of reasons--some referencing to
personnel and others to other components of the service. A second
general objective is to bring achieved productivity into line with
theoretical productivity.
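
The distinction between theoretical and achieved productivity can be
illustrated with a short sketch. It assumes, purely for illustration,
that service output can be expressed as a single number of outcome
units and that productivity is output per unit of operating cost; the
figures and the function name are hypothetical.

```python
def productivity(outcome_units, cost):
    """Outcome units delivered per unit of operating cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return outcome_units / cost


# Hypothetical figures: what the service would deliver if every component
# performed up to capability, and what it delivered in operation, both at
# the same operating cost.
theoretical = productivity(outcome_units=120.0, cost=80.0)
achieved = productivity(outcome_units=90.0, cost=80.0)

shortfall = theoretical - achieved   # gap the second objective seeks to close
ratio = achieved / theoretical       # share of theoretical productivity realized

print(theoretical, achieved, shortfall, ratio)
```

Under these invented figures the operating service realizes
three-quarters of its theoretical productivity. The second general
objective amounts to driving that ratio toward one.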

Given the present state-of-the-art, it is necessary to move toward
definitive characterization of a service's theoretical productivity
concurrently with efforts to optimize the operating service's achieved
productivity (see Follettie, 1972). We currently lack the technology
to examine personnel, students, and other characteristics of a
design-form service and in consequence specify mean and dispersion
values for students transiting the service. The operating service must
be used to determine the transit rate distribution that the service
compels. Perhaps one reason why school personnel characteristics are
taken as falling outside the bounds of the educational engineering
effort in some accounts is that this view nicely resolves the problem of
distinguishing between theoretical and achieved productivity. There is
no such problem if we are stuck with whatever performance school
personnel care to provide. A second related way to avoid the problem
is by requiring summative evaluation to be comparative evaluation. At
its best, comparative summative evaluation is predicated on the same
level of performance by personnel of the old and new versions of a
service. The present paper treats educational service personnel as
integral components of service structure and so as having performance
capabilities that can be estimated under appropriate empirical
conditions. This view reintroduces the problem of distinguishing between
theoretical and achieved productivity and makes summative evaluation of
lower-order effects more dynamic and difficult than many yet concede.

A scenario sketch may clarify the character of the complexities
involved when a multiyear service is to be evaluated for lower-order
effects. Let us imagine that a sponsor charges a development
organization with developing the wherewithal for a service (or a version
of the service that is alternative to a prevailing version) that will
use six instructional years for 30 minutes per instructional day and
will transit entering first graders over specified service dimensions as
state-of-the-art-optimally as specified operating costs for the service
allow. Allow the development staff three years to develop the product
to the point where it is ready for a full-scale tryout, the development
staff and an independent evaluation team one year for a full-scale
tryout conducted under the condition of simultaneous installation across
year levels for the service, and the development staff and evaluation
team six years for probationary operation wherein the service transits
an entering first grader from his point of entry to an exit point that
is rate-optimal for the student.

During the full-scale tryout, the evaluation team should be
collecting data near-continuously and passing along to the development
staff any findings that might be pertinent to modifying the service as
a prelude to probationary installation. If the product in tryout form
is unpromising, that might be the end of it. If it is promising, then
tryout evaluation might yield data--quite possibly incidental to
first-order effects evaluation--that suggest how the product might be
made more promising still. In this event, there would be little point in
the development staff keeping its hands off the product whether in the
sense of the installed service under evaluation or the form it will
take during probationary operation. Small modifications should be in
order.

During probationary operation, we might again think in terms of
defined data flow that the evaluation team monitors and passes along
to interested audiences, including the development staff, whose
responsibility would be to fine-tune the product to optimize it for
productivity.

The setting of productivity standards, the decisions to install
on a probationary basis and to accept the product, and evaluation that
most aptly serves these decisions could all be made easier if we were
to allow the full-scale tryout to use six years--and would become more
straightforward still if it were possible where necessary to repeat
the six-year tryout. However, unless applicable states-of-the-art are
advancing much more slowly than we imagine, a service would be
hopelessly outdated before reaching probationary installation with such
a generous full-scale tryout period. Such problems do not arise when
the product to be evaluated is a simpler one having much shorter
theoretical transits. If all involved are willing to be flexible,
then the one-year full-scale tryout buys problems that we can afford.

A full-scale tryout inevitably occurs under conditions that are
not isomorphic with the designed-for situation. Thus, it may be found
useful during a tryout to minimize personnel malperformance through
overtraining and overevaluation or by utilizing R&D personnel in
selected positions. It usually will prove necessary to install a
multiyear service simultaneously across year levels for tryout
purposes. This installation strategy rules out that entrants to
higher-year levels will be graduates of the service's lower-year levels,
which typically contradicts design specifications for the multiyear
service. Hence, first-order effects obtained in the tryout situation
will depart somewhat from what would be expected if the product were
longitudinally installed in the operating setting and used available
personnel trained according to provisions of the service specifications.
The purpose of a full-scale tryout, then, is to test the promise of the
product in operational use. Evaluation in the tryout setting serves a
decision to install or not install the product for a probationary
period. The tryout provides a basis for distinguishing between
theoretical and achieved productivity under conditions that depart from
design specifications, where some of these departures are favorable to
service performance and others are not. Definitive standard-setting can
only occur during probationary operation.

A decision reached to install the product for a probationary
period should usher in summative evaluation that definitely establishes
first- and second-order effects. This evaluation effort is in support
of a decision to accept or not accept the product as designed until
a) the service becomes obsolete or b) apprehended higher-order effects
suggest a need to modify or supplant the service. Throughout the
probationary period, some staff should be progressively modifying
standards defining theoretical productivity toward definitive standards
that, representing a fair contract, will characterize achievable
productivity.

A decision reached to accept the product might usher in summative 
evaluation that establishes higher-order effects of the service or of 
the full-service school. Bauer's views on social cause and effect 
suggest that summative evaluation at this level would need to be general
and so defined on broader social need than the school alone ever could 
address. 

The apparent ultimate consequence of Scriven's views on summative
evaluation of higher-order effects is that such an evaluation would
reference narrowly to a specified simple educational product. Since
the educational antecedents to higher-order effects all are pertinent
to these effects, the full-service school probably provides a better
basis for the antecedent referencing of higher-order effects evaluation
than does some one facet of its educational effort. With so
many arguing that formal education has no measurable higher-order
effects, who could believe that some small part of it could? Policy
science may not be dismayed by Scriven's proposition. An empirical
science would have to be.




Scriven's "consumer orientation" to summative evaluation should
compel him eventually to return to the lower-order effects evaluation
domain that the earlier Scriven charted. When costly and complex
products are to be evaluated at this level, a host of difficult problems
remain to resolve before the work can be considered technically routine.
Something like the progression of summative-formative interactions that
are sketched above appears necessary because complex educational
products perform over extended time. All of this performance is
pertinent--both to sponsor decisions and to subordinate decisions by
development staffs. Grand summations having no interim import can only
occur when the product is of modest proportions. Who much cares when
that is the case?



DEVELOPMENT-EVALUATION CONTRACTING PROCEDURE

In the NIE notes, Scriven is centrally concerned with conflicts of
interest that may arise if one allows a development staff to summatively
evaluate the product developed. That concern is apt. However, Scriven
quickly goes from there to the idea of outside evaluation teams to
whom the sponsor gives full discretion to decide what the pertinent
dimensions of evaluation are at every order level, what order levels
are pertinent, etc. Essentially, Scriven advocates giving carte blanche
to the evaluation team, presumably on the basis that such private groups
with their individual proprietary interest are the nation's best source
of consumer protection Ivanhoes (a characteristic that is additional to
Leonardoesque proficiencies). Carte blanche threatens every society that
gives it to any small group--whether elected, appointed, or
self-appointed. Scriven is correct to seek to remove
conflicts-of-interest temptations from development staffs. These must be
removed from all sources of participation in the R&D enterprise.
However, Scriven merely transfers a license to steal from one group to
another.

While one can accept the view that others can evaluate a
brainchild in a more disinterested way than can its creator, it does not
follow that an outsider is more competent to perceive an apt design
for evaluation than is an insider. Merton (1972) nicely responds to
the view that location outside an operation somehow guarantees
objective purity. According to Merton, "The role of the Outsider no more
guarantees emancipation from the myths of a collectivity than the role
of the Insider guarantees unfailing insight into its social life and
belief-systems." I would make the statement bidirectional. Different
points of view are useful because they are predicated on different
biasing premises, rather than because some of them transcend
personally-referenced hangups. As with so many macrodimensions of life,
the extreme values of solipsism and naive realism tend not to be
utilitarian alternatives to a middle ground that makes us all observers
who can only to some extent overcome the shackles of personal experience
to gain intersubjective views that many can share. I concur with
Merton's view that the search for truth will best be served if insider
and outsider interact with each other during the quest. However, when
this happens, then insider-outsider terminology becomes less descriptive
because we then bring all those with pertinent views in some sense into
the same tent. The issue concerning who conducts a specified evaluation
(or executes its design) is separable from the issue concerning who
designs the evaluation (or the issue concerning how it is designed).
When we separate these issues, conflicts of interest are diminished or
removed and the stage is set for bringing to bear the different
pertinent points of view whose joint consideration insures that a strong
evaluation design will be produced.


All interested and qualified parties may participate in the design
of summative evaluations--whether classical or in the extended sense
sketched earlier. Scriven's notion that an independent evaluation team
might, by becoming aware of proficiency dimension specifications
advocated by a development staff, in some sense be contaminated is no
more than the notion that some individuals are quite impressionable or
easy to dominate. A broadly-based design effort surely will embrace some
such individuals. However, the more likely consequence of a
broadly-based effort is a "Tower of Babel" of idiosyncratic points of
view that refuse to consider other pertinent points of view. (The
problem of development staff idiosyncrasy disappears when we agree to
extend the summative evaluation concept down to formulation of product
specifications.)

Responsibility for executing designed summative evaluations may be
discharged according to normal contract-letting and contract-monitoring
procedures of the sponsor. The contracting organization should have a
track record that indicates competence in areas specified by the
design-reflecting contract. The organization should manifest no conflict
of interest. Were there a market for contracted evaluations, it is
likely that private industry quickly would evidence the required
evaluation capability--assuming that comments such as are scattered
throughout the present paper first sketch the pertinent
state-of-the-art.



VI 

CONCLUDING REMARKS

Evaluation efforts are separable from development efforts--and
increasingly economically separable as product complexity increases.
Earliest evaluations address relevance issues--whether intuitively or
empirically--and the question concerning how much proposed product
specifications exploit applicable states-of-the-art consonant with
imposed bounds for operating costs. Later evaluations address product
productivity as defined on lower-order and particularly first-order
effects and longer-term relevance-productivity as defined on
higher-order effects.

Whether we should bend formative-summative evaluation terminology
to the description of a formulation-development-postdevelopment
progression of evaluations--as I have done--is not an important issue.
All such evaluations might be viewed as summative from the standpoint
of a responsible sponsor and as formative from the standpoint of a
development organization that is charged with modifying the evaluated
work. The sponsor and the development organization can be viewed as
joint consumers of the same set of evaluative findings. Particularly
when the product is complex, findings that are of primary interest to
one of these consumers often will prove no less than of secondary
interest to the other.

Contracted independent evaluation seems required for all
evaluations conducted for a sponsor. Entertainably, the best interest of
a development organization also will be served by independent evaluators
working under contract. The evaluation organization can no more be
given carte blanche concerning what work it will do than can the
development organization. Once we extend evaluation down to product
formulation activities, the notion that the dimensions for
proficiency-behavior evaluation are idiosyncratic inventions of the
development organization or staff loses all credibility. At that point,
so does Scriven's notion that the evaluation team should have carte
blanche.

The two-stage educational R&D perspective that gives rise to the
classical view of a formative evaluation period followed by a summative
evaluation period is that of "little educational R&D." When the
product is viewed as incorporating much or all of the educational
structure of the school, brought to bear on an appreciable portion of
the school's functions, we reach a large-scale level of concern that
small diverse R&D efforts cannot effectively address. "Big educational
R&D" then becomes appropriate. Such effort need not be monolithic and
can be spared this fate if the sponsor is required to act responsively
and responsibly and so to involve all communities that have a
contribution to make. It is not inevitable that a system of large
organized educational R&D strive, like the French Third Republic, to
orient each penciled hand to the same point in a national educational
program at a given instant in time. The Soviet Union incurs penalties in
important areas because its education is not appropriately balanced
along the conformity-independence dimension (cf. Bronfenbrenner, 1970).
We can hope to have large-scale educational R&D that is not the pawn of
bureaucratic tyranny because such strangleholds on commerce increasingly
will pose both internal and external threats to the nation. We must
address educational problems at their level of complexity and accept the
resulting challenge to so organize ourselves that no one group can
dominate the enterprise.

That professional people engage in professional activity is
nine-tenths myth. Most professional people travel previously-plowed
ground most of the time and so operate as technicians much more often
than as professionals. Technical work can be specified although, varying
in complexity, more easily in some cases than others. The notion that
teachers--like it or not--will be the final arbiters of practice is
understandable. They are only mimicking politicians, doctors, lawyers,
professors, educational R&D personnel, and others now shaded by the
umbrella of professional mystique. The technical efforts of all
professionals--including teachers--should increasingly come under
provisions of a social doctrine of accountability. Technical educational
practice should be evaluated. Much of the effort of "big educational
R&D" might usefully be devoted to providing a framework that insures
that service personnel are evaluated equitably and fairly. This
requirement alone destroys the classical oversimplification of
evaluation as a two-stage formative-summative categorization.

Scriven's concerns with higher-order effects of education are
legitimate but appear too narrowly referenced and premature excepting
in the policy science sense of evaluation. We will not get any other
kind of evaluation of higher-order effects until a system of social
indicators such as is sketched by Bauer and his associates is developed,
evaluated, and appropriately institutionalized.

Like it or not, we currently are stuck with comparative-relative
evaluation for every evaluation requirement save one--first-order
effects evaluation of the product and, through it, of the development
staff. Entertainably, comparative-relative evaluation sometimes will
be appropriate when first-order product effects require evaluation.
However, one suffers this state of affairs, rather than champions
it, for it typically reflects a cop-out concession to a defective 
educational status quo. 






REFERENCES 

Atkin, J. M. & Grotelueschen, A. On changing educational practice.
     Paper prepared for the Syracuse University Research Corporation
     Policy Institute. December 10, 1971.

Bauer, R. A. Detection and anticipation of impact: The nature of the
     task. In R. A. Bauer (Ed.), Social indicators. Cambridge, Mass.:
     MIT Press, 1966.

Bloom, B. S., Hastings, J. T., & Madaus, G. F. Handbook on formative and
     summative evaluation of student learning. New York: McGraw-Hill,
     1971.

Braybrooke, D. & Lindblom, C. E. A strategy of decision. New York:
     Free Press, 1963.

Bronfenbrenner, U. Two worlds of childhood: U.S. and U.S.S.R. New
     York: Russell Sage Foundation, 1970.

Follettie, J. F. Alternative designs for educational systems.
     Technical Report No. 45, 1972, SWRL Educational Research and
     Development, Los Alamitos, California.

Jencks, C. J., Smith, M., Acland, H., Bane, M. J., Cohen, D., Gintis, H.,
     Heyns, B., & Michelson, S. Inequality: A reassessment of the
     effect of family and schooling in America. New York: Basic Books,
     Inc., 1972.

Land, K. C. Social indicator models: An overview. Paper presented at
     AAAS Meeting, Washington, D.C., December 26, 1972.

Light, R. J. & Smith, P. V. Choosing a future: Strategies for designing
     and evaluating new programs. Harvard Educational Review,
     1970, 40, 1-28.

Merton, R. K. Insiders and outsiders. University Lecture delivered at
     Columbia University, November 27, 1972. Cited in: Does it take
     one to know one? Columbia Reports, January 1973, p. 2.

Price, D. J. de S. Little science, big science. New York: Columbia
     University Press, 1963.

Scriven, M. The methodology of evaluation. In R. W. Tyler, R. M. Gagne,
     & M. Scriven, Perspectives of curriculum evaluation. Chicago:
     Rand McNally, 1967.

Siegel, L., & Siegel, L. C. A multivariate paradigm for educational
     research. Psychological Bulletin, 1967, 68, 306-326.

Stephens, J. M. The process of schooling. New York: Holt, Rinehart &
     Winston, 1967.






